Shortest triplet clustering: reconstructing large phylogenies using representative sets

Authors

Le Sy Vinh

Arndt von Haeseler

Abstract

BACKGROUND: Understanding the evolutionary relationships among species based on their genetic information is one of the primary objectives of phylogenetic analysis. Reconstructing phylogenies for large data sets is still a challenging task in bioinformatics.

RESULTS: We propose a new distance-based clustering method, the shortest triplet clustering algorithm (STC), to reconstruct phylogenies. The main idea is the introduction of a natural definition of so-called k-representative sets. Based on k-representative sets, shortest triplets are reconstructed and serve as building blocks for the STC algorithm, which agglomerates sequences for tree reconstruction in O(n^2) time for n sequences. Simulations show that STC gives better topological accuracy than the other tested methods that also build a starting tree, so STC appears to be a very good method for starting the tree reconstruction. However, all tested methods give similar results when balanced nearest neighbor interchange (BNNI) is applied as a post-processing step, and BNNI leads to an improvement in all instances. The program is available at http://www.bi.uni-duesseldorf.de/software/stc/.

CONCLUSION: The results demonstrate that the new approach efficiently reconstructs phylogenies for large data sets. We found that BNNI boosts the topological accuracy of all methods, including STC. Therefore, BNNI should be used as a post-processing step to obtain better topological accuracy.
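The abstract treats triplets of sequences as building blocks. The sketch below shows how a three-taxon (star) tree and its length can be computed from a pairwise distance matrix, using the standard three-point formula. It does not reproduce STC. It assumes a triplet's "length" is the total branch length of its star tree, which is an assumption, not the paper's stated definition. STC's k-representative sets and its O(n^2) bookkeeping are not shown. The helper names `triplet_branch_lengths` and `shortest_triplet` are hypothetical, and the brute-force search is O(n^3), for illustration only.

```python
from itertools import combinations

def triplet_branch_lengths(d, i, j, k):
    """Branch lengths of the star tree on taxa i, j, k, computed from
    pairwise distances d with the three-point formula."""
    li = (d[i][j] + d[i][k] - d[j][k]) / 2
    lj = (d[i][j] + d[j][k] - d[i][k]) / 2
    lk = (d[i][k] + d[j][k] - d[i][j]) / 2
    return li, lj, lk

def shortest_triplet(d, taxa):
    """Hypothetical helper: return the triplet whose star tree has the
    smallest total branch length (assumed criterion), by brute force.
    Ties keep the first triplet found in combinations() order."""
    best, best_len = None, float("inf")
    for i, j, k in combinations(taxa, 3):
        total = sum(triplet_branch_lengths(d, i, j, k))
        if total < best_len:
            best, best_len = (i, j, k), total
    return best, best_len

# Additive distances for the tree ((0:1,1:1):1,(2:2,3:2)).
d = [
    [0, 2, 4, 4],
    [2, 0, 4, 4],
    [4, 4, 0, 4],
    [4, 4, 4, 0],
]
print(triplet_branch_lengths(d, 0, 1, 2))  # (1.0, 1.0, 3.0)
print(shortest_triplet(d, range(4)))       # ((0, 1, 2), 5.0)
```

Because the distances are additive, the three-point formula recovers the true leaf-to-center path lengths. Taxon 2's value of 3.0 is its own branch of 2 plus the internal edge of 1.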


Similar resources

Phylogenomic Analysis Using Bayesian Congruence Measuring

Phylogenomic analysis of large sets of molecular characters, primarily DNA and proteins, provides great opportunities to estimate and understand important evolutionary processes. However, molecular phylogenies inferred from individual loci often differ. This incongruence among phylogenies can be the result of systematic error, but can also be the result of different evolutionary histories. We p...


An Incremental DC Algorithm for the Minimum Sum-of-Squares Clustering

Here, an algorithm is presented for solving the minimum sum-of-squares clustering problems using their difference of convex representations. The proposed algorithm is based on an incremental approach and applies the well known DC algorithm at each iteration. The proposed algorithm is tested and compared with other clustering algorithms using large real world data sets.


Toward an Efficient Method of Identifying Core Genes for Evolutionary and Functional Microbial Phylogenies

Microbial community metagenomes and individual microbial genomes are becoming increasingly accessible by means of high-throughput sequencing. Assessing organismal membership within a community is typically performed using one or a few taxonomic marker genes such as the 16S rDNA, and these same genes are also employed to reconstruct molecular phylogenies. There is thus a growing need to bioinfor...


On the hardness of inferring phylogenies from triplet-dissimilarities

This work considers the problem of reconstructing a phylogenetic tree from triplet dissimilarities, which are dissimilarities defined over taxon triplets. Triplet dissimilarities are possibly the simplest generalization of pairwise dissimilarities, and were used for phylogenetic reconstructions in the past few years. We study the hardness of finding a tree best fitting a given triplet-dissimilar...


Clustering and Reconstructing Large Data Sets

by Piyush Kumar, Doctor of Philosophy in Computer Science, Stony Brook University, June 2004



Journal:

BMC Bioinformatics

Volume 6, Issue –

Pages –

Publication date: 2005
